IFT6758-A24 Blog
A presentation blog of our data science project about the NHL

Milestone 1

Data acquisition

Question 1:

Package description

This package fetches data from the NHL API and stores it.

It uses the ApiClient class to fetch the data and the FileSystemCache class to store it.

It stores two kinds of data: the cache and the dump.

While data is being fetched, the cache holds it temporarily. This lets the fetching process run in several steps without downloading the same data more than once. The cache is stored in the ift6758/data/storage/cache folder. You can change this path with the CACHE_PATH environment variable.

Once a season's data has been fetched, it is stored in the dump. At the end of the fetching process, we recommend clearing the cache to free up disk space. The dump is stored in the ift6758/data/storage/dump folder. You can change this path with the DUMP_PATH environment variable.
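The cache mechanism above can be illustrated with a minimal file-backed cache. This is only a sketch and not the project's FileSystemCache. The class name SimpleFileCache, the helpers cache_dir_from_env and cached_fetch, and the one-file-per-key layout are all hypothetical. Only the CACHE_PATH variable and the default path come from the text.

```python
import os
import shutil
from pathlib import Path


def cache_dir_from_env(default="ift6758/data/storage/cache"):
    """Cache directory: CACHE_PATH if it is set, otherwise the default path from the text."""
    return os.environ.get("CACHE_PATH", default)


class SimpleFileCache:
    """Hypothetical key/value cache that stores one text file per key."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.directory / f"{key}.json"

    def get(self, key):
        """Returns the stored text, or None on a cache miss."""
        path = self._path(key)
        return path.read_text(encoding="utf-8") if path.exists() else None

    def set(self, key, value):
        self._path(key).write_text(value, encoding="utf-8")

    def clear(self):
        """Deletes every cached file to free up disk space."""
        shutil.rmtree(self.directory, ignore_errors=True)
        self.directory.mkdir(parents=True, exist_ok=True)


def cached_fetch(cache, key, fetch):
    """Returns the cached value for `key` and calls `fetch()` only on a miss,
    so an interrupted run can resume without downloading the same data again."""
    value = cache.get(key)
    if value is None:
        value = fetch()
        cache.set(key, value)
    return value
```

With `cache = SimpleFileCache(cache_dir_from_env())`, calling `cached_fetch(cache, game_id, download)` twice downloads the game only once. A second instance pointed at the dump folder could hold the per-season files.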

Usage: fetching the data from the API

from ift6758.data import fetch_all_seasons_games_data

fetch_all_seasons_games_data()

This fetches every game from the NHL API and stores the games in ift6758/data/storage/dump.

To free up disk space, clear the cache:

from ift6758.data import clear_cache

clear_cache()

You can also simply delete the ift6758/data/storage/cache folder.

Loading the flattened data into a DataFrame

This loads the data from the dump and flattens it into a DataFrame.

from ift6758.data import load_events_dataframe

df_2020 = load_events_dataframe(2020)
df_all = load_events_dataframe()
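The flattening step can be pictured with the sketch below, which turns a game's plays into one flat record per event. The field names (plays, eventId, periodDescriptor, timeInPeriod, typeDescKey, details.xCoord, ...) are the ones the debugging tool further down reads. The function name flatten_game and the choice of output columns are hypothetical and may not match what load_events_dataframe actually produces.

```python
def flatten_game(game):
    """Returns one flat dict per event of a game (illustrative column selection)."""
    records = []
    for play in game["plays"]:
        details = play.get("details", {})  # some events have no details
        records.append({
            "game_id": game["id"],
            "event_id": play["eventId"],
            "period": play["periodDescriptor"]["number"],
            "time_in_period": play["timeInPeriod"],
            "event_type": play["typeDescKey"],
            "x": details.get("xCoord"),  # None when the event has no position
            "y": details.get("yCoord"),
            "shooter_id": details.get("shootingPlayerId"),
            "goalie_id": details.get("goalieInNetId"),
        })
    return records
```

Concatenate the records of every game and pass them to `pandas.DataFrame(records)` to get one row per event.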

Advanced usage

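import os
import json

from ift6758.data import ApiClient, FileSystemCache, DataTransformer, GameType

# Paths relative to this file: this assumes the script lives in ift6758/data/
# (__file__ is not defined in a notebook)
base_dir = os.path.dirname(os.path.abspath(__file__))

cache = FileSystemCache(os.path.join(base_dir, "storage", "cache"))
dump = FileSystemCache(os.path.join(base_dir, "storage", "dump"))

# The client uses the cache to avoid fetching the same data twice
client = ApiClient(cache)

data_transformer = DataTransformer()

# Fetch the regular-season and playoff games of the 2020 season, then save them to the dump
data = client.get_games_data(2020, [GameType.REGULAR, GameType.PLAYOFF])
dump.set("2020", json.dumps(data, indent=2))

# Flatten the events, either as a DataFrame or as a list of records
df = data_transformer.flatten_raw_data_as_dataframe(data)
records = data_transformer.flatten_raw_data_as_records(data)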

Interactive debugging tool

Question 1:

Screenshot

[Screenshot of the interactive debugging tool]

Tool code

import os
import json
from functools import lru_cache

import ipywidgets as widgets
from IPython.display import clear_output, display
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

data_folder = '../ift6758/data/storage/dump/'
rink_image_path = '../figures/nhl_rink.png'

# Size of the drawing area (plot units)
rink_width = 800
rink_height = 400

# Bounds of the NHL coordinate system, in feet (centre ice is at (0, 0))
rink_x_min, rink_x_max = -100, 100
rink_y_min, rink_y_max = -42.5, 42.5

# Game currently displayed, read by the event slider's callback
current_game = {'data': None}


@lru_cache(maxsize=None)
def load_season_data(season):
    """Loads a season's games from the dump; each file is read from disk only once."""
    file_path = os.path.join(data_folder, f"{season}.json")
    with open(file_path, 'r') as f:
        return json.load(f)


def filter_games_by_type(data, season_type):
    """
    Filters the game data by regular season ('02') or playoffs ('03').
    It checks the 5th and 6th characters of the game ID to determine the game type.
    'all' returns every game.
    """
    if season_type == 'all':
        return data
    return [game for game in data if str(game['id'])[4:6] == season_type]


# Optional fields of an event's 'details', with their display labels
DETAIL_LABELS = {
    'reason': 'Reason',
    'winningPlayerId': 'Winning Player ID',
    'losingPlayerId': 'Losing Player ID',
    'shootingPlayerId': 'Shooting Player ID',
    'goalieInNetId': 'Goalie in Net ID',
    'hittingPlayerId': 'Hitting Player ID',
    'hitteePlayerId': 'Hittee Player ID',
    'blockingPlayerId': 'Blocking Player ID',
}


def display_event_info(game_data, event_index):
    event_data = game_data['plays'][event_index]
    details = event_data.get('details', {})  # some events have no details

    with event_output:
        clear_output(wait=True)
        print(f"Event ID: {event_data['eventId']}")
        print(f"Period: {event_data['periodDescriptor']['number']}")
        print(f"Time in Period: {event_data['timeInPeriod']}")
        print(f"Event Type: {event_data['typeDescKey']}")

        # Draw the rink, with the event's position when it has one
        if 'xCoord' in details and 'yCoord' in details:
            print(f"Event Position: x={details['xCoord']}, y={details['yCoord']}")
            display_rink_image(*transform_coordinates(details['xCoord'], details['yCoord']))
        else:
            display_rink_image()

        for key, label in DETAIL_LABELS.items():
            if key in details:
                print(f"{label}: {details[key]}")

        print("\nJSON:")
        print(json.dumps(event_data, indent=4))


def display_game_info(season, game_index):
    # 'all' is handled by filter_games_by_type (previously it returned no games)
    games = filter_games_by_type(load_season_data(season), season_type_selector.value)
    game_data = games[game_index]
    current_game['data'] = game_data

    with match_output:
        clear_output(wait=True)
        print(f"Game ID: {game_data['id']}")
        print(f"Date: {game_data['gameDate']}")
        print(f"Home Team: {game_data['homeTeam']['name']['default']} (Score: {game_data['homeTeam']['score']})")
        print(f"Away Team: {game_data['awayTeam']['name']['default']} (Score: {game_data['awayTeam']['score']})")
        print(f"Venue: {game_data['venue']['default']} - {game_data['venueLocation']['default']}")
        print(f"Start Time (UTC): {game_data['startTimeUTC']}")

    # Reset the event slider; its observer only fires if the value actually changes
    was_zero = event_slider.value == 0
    event_slider.value = 0
    event_slider.max = max(len(game_data['plays']) - 1, 0)
    if was_zero:
        update_event_output()


def transform_coordinates(x, y):
    """Maps rink coordinates (feet) to plot coordinates, with +y at the top."""
    transformed_x = (x - rink_x_min) / (rink_x_max - rink_x_min) * rink_width
    transformed_y = (y - rink_y_min) / (rink_y_max - rink_y_min) * rink_height
    return transformed_x, transformed_y


def display_rink_image(xCoord=None, yCoord=None):
    fig, ax = plt.subplots(figsize=(8, 4))
    img = mpimg.imread(rink_image_path)

    # extent = [left, right, bottom, top]: the image fills the whole plot area
    ax.imshow(img, extent=[0, rink_width, 0, rink_height])

    if xCoord is not None and yCoord is not None:
        ax.plot(xCoord, yCoord, 'go', markersize=8, label="Event Position")
        ax.legend()

    # A regular (non-inverted) y axis keeps the image upright
    ax.set_xlim(0, rink_width)
    ax.set_ylim(0, rink_height)

    ax.axis('off')
    plt.show()


season_selector = widgets.Dropdown(
    options=[2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
    description='Season:'
)

season_type_selector = widgets.Dropdown(
    options={'All': 'all', 'Regular Season': '02', 'Playoffs': '03'},
    description='Season Type:'
)

game_slider = widgets.IntSlider(min=0, max=100, step=1, description='Game:')
event_slider = widgets.IntSlider(min=0, max=10, step=1, description='Event:')

match_output = widgets.Output()
event_output = widgets.Output()


def update_game_slider(*args):
    games = filter_games_by_type(load_season_data(season_selector.value), season_type_selector.value)
    was_zero = game_slider.value == 0
    game_slider.value = 0  # fires update_game_output only if the value changes
    game_slider.max = max(len(games) - 1, 0)
    if was_zero:
        update_game_output()


def update_game_output(*args):
    display_game_info(season_selector.value, game_slider.value)


def update_event_output(*args):
    game_data = current_game['data']
    if game_data is not None and game_data['plays']:
        display_event_info(game_data, event_slider.value)


# Register each observer once (registering inside display_game_info stacked a new one per game)
season_selector.observe(update_game_slider, names='value')
season_type_selector.observe(update_game_slider, names='value')
game_slider.observe(update_game_output, names='value')
event_slider.observe(update_event_output, names='value')

# Show the widgets, then load the first game of the default season
display(season_selector, season_type_selector, game_slider, match_output, event_slider, event_output)
update_game_slider()

Data cleaning

Question 1:

Screenshot of our DataFrame

Question 2:

If the “strength” field did not exist for shots, one way to recover it would be to use the “penalty” events. Each penalty has a start time and a duration. From these, we can determine how many skaters each team has on the ice at any moment, and therefore whether a shot was taken at even strength, short-handed, or on the power play. Keep in mind that a minor penalty ends early when the team on the power play scores. The only other piece of information this deduction needs is the time at which the shot was taken, and we have it.
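The deduction can be sketched as follows. This is a simplified illustration that rests on several assumptions:

- Penalties are reduced to a hypothetical Penalty(team, start, duration) record.
- Times are converted to seconds since the start of the game, assuming 20-minute periods.
- Early termination of minor penalties and 3-on-3 overtime are ignored.
- A team never drops below three skaters.

```python
from dataclasses import dataclass


@dataclass
class Penalty:
    team: str       # team serving the penalty
    start: int      # seconds since the start of the game
    duration: int   # e.g. 120 for a minor penalty


def game_seconds(period, time_in_period):
    """Converts a period number and an elapsed 'MM:SS' time into seconds since the start of the game."""
    minutes, seconds = map(int, time_in_period.split(":"))
    return (period - 1) * 20 * 60 + minutes * 60 + seconds


def skaters_on_ice(penalties, team, t, full_strength=5):
    """Skaters `team` has at time t: one fewer per active penalty, never below three."""
    active = sum(1 for p in penalties if p.team == team and p.start <= t < p.start + p.duration)
    return max(full_strength - active, 3)


def shot_strength(penalties, shooting_team, defending_team, t):
    """'even', 'power play' or 'short-handed', from the shooting team's point of view."""
    own = skaters_on_ice(penalties, shooting_team, t)
    opp = skaters_on_ice(penalties, defending_team, t)
    if own == opp:
        return "even"
    return "power play" if own > opp else "short-handed"
```

Handling early termination would mean ending a minor penalty at the first power-play goal scored against the penalized team.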

Question 3:

Feature 1: Rebound
We could define a rebound as any shot taken less than 2 seconds after a previous shot. Both shots must, of course, be against the same goalie. Rebounds are more likely than average to go in, which makes them worth analyzing.

Feature 2: Shot after a faceoff
We could define a shot after a faceoff as any shot taken less than 3 seconds after a faceoff. This would let us analyze what it takes to score directly off a faceoff.
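Both time-window features above can be computed in one chronological pass over a game's events. The 2- and 3-second windows come from the definitions above. The rest is assumed: the event dicts use hypothetical keys (type, time in seconds since the start of the game, goalie_id), and the set of shot types (shot-on-goal, missed-shot, goal, meant to match typeDescKey values) is a guess.

```python
# Event types counted as shots (assumed typeDescKey values)
SHOT_TYPES = {"shot-on-goal", "missed-shot", "goal"}


def flag_rebounds_and_faceoff_shots(events, rebound_window=2, faceoff_window=3):
    """Adds 'is_rebound' and 'after_faceoff' to every shot of one game.

    `events` must be sorted chronologically; times are in seconds.
    """
    last_shot = None      # previous shot event, if any
    last_faceoff = None   # time of the previous faceoff, if any
    for event in events:
        if event["type"] == "faceoff":
            last_faceoff = event["time"]
        elif event["type"] in SHOT_TYPES:
            event["is_rebound"] = (
                last_shot is not None
                and event["time"] - last_shot["time"] < rebound_window
                and event["goalie_id"] == last_shot["goalie_id"]  # same goalie
            )
            event["after_faceoff"] = (
                last_faceoff is not None
                and event["time"] - last_faceoff < faceoff_window
            )
            last_shot = event
    return events
```

Both windows are strict ("less than"), so a shot exactly 3 seconds after a faceoff is not flagged.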

Feature 3: Penalty shot
We could define a penalty shot as any shot with a “situationCode” of 1010 or 0101 taken in a period lower than 5. The situation code means that only one shooter and one goalie are on the ice. The period being lower than 5 means that the attempt is not part of a shootout, so it must be a penalty shot (taken during one of the three regulation periods or during overtime). This period rule only holds for regular-season games: playoff games have no shootout, so their overtime periods can continue beyond period 4.
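The penalty-shot rule can be written as a small predicate. It assumes that the four digits of situationCode are, in order: away goalie, away skaters, home skaters, home goalie. Under that reading, 1010 and 0101 both mean a single shooter facing a goalie. It also reuses the game-type digits of the game ID, as in the debugging tool ('03' for playoffs), so that period 5 is never treated as a shootout in the playoffs. The function name is hypothetical.

```python
def is_penalty_shot(situation_code, period, game_id):
    """True for a one-on-one attempt (one shooter vs. one goalie) that is not part of a shootout."""
    # zfill restores the leading zero if the code was parsed as an integer (0101 -> 101)
    one_on_one = str(situation_code).zfill(4) in {"1010", "0101"}
    # Characters 5-6 of the game ID encode the game type; '03' = playoffs
    is_playoffs = str(game_id)[4:6] == "03"
    # Playoff overtime never ends in a shootout, so period 5+ can still hold penalty shots
    in_shootout = not is_playoffs and period >= 5
    return one_on_one and not in_shootout
```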

Simple visualizations

Question 1

A plot comparing the shot types of all teams in a season, with discussion.

a)

A plot showing the relationship between the distance from which a shot is taken and the chance that it results in a goal, for every season from 2018-19 to 2020-21, with discussion.
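The distance on the x-axis and the goal rate on the y-axis could be computed as in the sketch below. It assumes the goal lines are at x = ±89 ft, the standard NHL layout (a 200 ft rink with goal lines 11 ft from the end boards). As a simplification, it also assumes every shot targets the net on the shooter's side of centre ice. The function names and the 5 ft bins are hypothetical choices.

```python
import math
from collections import defaultdict

# Goal lines sit 89 ft from centre ice on a standard 200 ft NHL rink
NET_X = 89.0


def shot_distance(x, y):
    """Distance (ft) from (x, y) to the net on the shooter's side of centre ice (simplifying assumption)."""
    return math.hypot(NET_X - abs(x), y)


def goal_rate_by_distance(shots, bin_size=5):
    """Maps each distance bin (lower bound, in ft) to the fraction of its shots that were goals.

    `shots` is an iterable of (x, y, is_goal) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [goals, shots]
    for x, y, is_goal in shots:
        bin_start = int(shot_distance(x, y) // bin_size) * bin_size
        counts[bin_start][0] += int(is_goal)
        counts[bin_start][1] += 1
    return {b: goals / total for b, (goals, total) in sorted(counts.items())}
```

Plotting the returned dictionary for each season gives the kind of figure described above.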

Question 2

A new plot combining the two plots above, with discussion.

Advanced visualizations

Question 1

Four offensive-zone plots that let you select any team for the season.

Question 2

Discussion

Question 3

Discussion of the evolution of the Colorado Avalanche.

Question 4

Discussion of the Buffalo Sabres and the Tampa Bay Lightning.

Don’t mind anything after this line. We can remove everything easily, but it contains important information so let’s keep it for now.

IFT6758 Demo Post

This post outlines a few more things you may need to know for creating and configuring your blog posts. If you are interested in more general template features or syntax, you can visit the Introducing Lanyon or the Example Content posts.

Configurations

You should modify some of the default values in _config.yml, found in the root directory of this repo. Things like the title, tagline, description, author information, etc. are all fair game to modify. Be more careful when modifying the url information: things can break if it is done incorrectly (these values are used if you are deploying via GitHub Pages).

Creating Posts

To create a new post in the blog, add a new Markdown file to the _posts/ directory, with the name following the format YYYY-MM-DD-postname.md. Begin the post with the following code:

---
layout: post
title: [POST TITLE]
---

From there, write your content as you would a normal Markdown file. In general, I would recommend writing one sentence per line. This is not required, but this is far easier to work with than having a single giant line of multiple sentences for a single paragraph.
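If you create posts often, a small helper can generate the file name and the front matter for you. This is a hypothetical convenience script and not part of Jekyll or the theme. The name create_post and its parameters are illustrative. It only follows the YYYY-MM-DD-postname.md naming rule and the front matter described above.

```python
from datetime import date
from pathlib import Path


def create_post(title, slug, posts_dir="_posts", day=None):
    """Creates posts_dir/YYYY-MM-DD-slug.md containing the post front matter."""
    day = day or date.today()
    path = Path(posts_dir) / f"{day:%Y-%m-%d}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"---\nlayout: post\ntitle: {title}\n---\n\n", encoding="utf-8")
    return path
```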

Interactive plots

Here’s how you could embed interactive figures that have been exported as HTML files. Note that we will be using plotly for this demo, but any library that can export to HTML should work. All that’s required is to export your figure to HTML and make sure the file exists in the _includes directory at the root of this repository. To embed it into any page, insert the following code anywhere in your page.

{% include [FIGURE_NAME].html %} 

For example, the following code can be used to generate the figure underneath it.

import pandas as pd
import plotly.express as px

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')

fig = px.density_mapbox(df, lat='Latitude', lon='Longitude', z='Magnitude', radius=10,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="open-street-map")  # tile style that needs no Mapbox access token
fig.show()

fig.write_html('./_includes/plotly_demo_1.html')

The above figure is pretty cool, but you can also embed heavier/more complex figures. For brevity, the following figure is generated from the included plotly_html.ipynb notebook file in the repo’s root directory.

Introducing Lanyon

Lanyon is an unassuming Jekyll theme that places content first by tucking away navigation in a hidden drawer. It’s based on Poole, the Jekyll butler.

Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Lanyon here), includes the following:

  • Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
  • Mobile friendly design and development
  • Easily scalable text and component sizing with rem units in the CSS
  • Support for a wide gamut of HTML elements
  • Related posts (time-based, because Jekyll) below each post
  • Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

Lanyon features

In addition to the features of Poole, Lanyon adds the following:

  • Toggleable sliding sidebar (built with only CSS) via link in top corner
  • Sidebar includes support for textual modules and a dynamically generated navigation with active link support
  • Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
  • Eight optional color schemes, available via <body> classes

Head to the readme to learn more.

Browser support

Lanyon is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

Download

Lanyon is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and feature requests.

Thanks!

Example content

Howdy! This is an example blog post that shows several types of HTML content supported in this theme.

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean eu leo quam. Pellentesque ornare sem lacinia quam venenatis vestibulum. Sed posuere consectetur est at lobortis. Cras mattis consectetur purus sit amet fermentum.

Curabitur blandit tempus porttitor. Nullam quis risus eget urna mollis ornare vel eu leo. Nullam id dolor id nibh ultricies vehicula ut id elit.

Etiam porta sem malesuada magna mollis euismod. Cras mattis consectetur purus sit amet fermentum. Aenean lacinia bibendum nulla sed consectetur.

Inline HTML elements

HTML defines a long list of available inline tags, a complete list of which can be found on the Mozilla Developer Network.

  • To bold text, use <strong>.
  • To italicize text, use <em>.
  • Abbreviations, like HTML, should use <abbr>, with an optional title attribute for the full phrase.
  • Citations, like Mark Otto, should use <cite>.
  • Deleted text should use <del> and inserted text should use <ins>.
  • Superscript text uses <sup> and subscript text uses <sub>.

Most of these elements are styled by browsers with few modifications on our part.

Heading

Vivamus sagittis lacus vel augue rutrum faucibus dolor auctor. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.

Code

Cum sociis natoque penatibus et magnis dis code element montes, nascetur ridiculus mus.

// Example can be run directly in your JavaScript console

// Create a function that takes two arguments and returns the sum of those arguments
var adder = new Function("a", "b", "return a + b");

// Call the function
adder(2, 6);
// > 8

Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa.

Lists

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.

  • Praesent commodo cursus magna, vel scelerisque nisl consectetur et.
  • Donec id elit non mi porta gravida at eget metus.
  • Nulla vitae elit libero, a pharetra augue.

Donec ullamcorper nulla non metus auctor fringilla. Nulla vitae elit libero, a pharetra augue.

  1. Vestibulum id ligula porta felis euismod semper.
  2. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
  3. Maecenas sed diam eget risus varius blandit sit amet non magna.

Cras mattis consectetur purus sit amet fermentum. Sed posuere consectetur est at lobortis.

HyperText Markup Language (HTML)
The language used to describe and define the content of a Web page
Cascading Style Sheets (CSS)
Used to describe the appearance of Web content
JavaScript (JS)
The programming language used to build advanced Web sites and applications

Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Nullam quis risus eget urna mollis ornare vel eu leo.

Tables

Aenean lacinia bibendum nulla sed consectetur. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

| Name    | Upvotes | Downvotes |
|---------|---------|-----------|
| Alice   | 10      | 11        |
| Bob     | 4       | 3         |
| Charlie | 7       | 9         |
| Totals  | 21      | 23        |

Nullam id dolor id nibh ultricies vehicula ut id elit. Sed posuere consectetur est at lobortis. Nullam quis risus eget urna mollis ornare vel eu leo.


Want to see something else added? Open an issue.

What's Jekyll?

Jekyll is a static site generator, an open-source tool for creating simple yet powerful websites of all shapes and sizes. From the project’s readme:

Jekyll is a simple, blog aware, static site generator. It takes a template directory […] and spits out a complete, static website suitable for serving with Apache or your favorite web server. This is also the engine behind GitHub Pages, which you can use to host your project’s page or blog right here from GitHub.

It’s an immensely useful tool and one we encourage you to use here with Lanyon.

Find out more by visiting the project on GitHub.